GH-1142: Add VectorOps utility and fix addVector/removeVector with safe shared semantics #1149

Open
John-W-Lewis wants to merge 2 commits into apache:main from John-W-Lewis:feature/vector-retain

@John-W-Lewis

Summary

Introduces VectorOps, a new utility class in org.apache.arrow.vector.util that provides three generic whole-vector operations:

  • shareCopy -- creates a new vector sharing the same underlying memory allocations via reference counting. Both source and result remain usable; memory is released only when all sharing vectors are closed.
  • transferCopy -- creates a new vector by transferring buffer ownership. The source is left empty and can be reused via allocateNew().
  • deepCopy -- creates a fully independent clone with its own buffer allocations.
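
The difference between the three operations can be sketched with a toy model. Everything in the block below (`Buf`, `Vec`, and the three static methods) is hypothetical and written only for illustration; it is not the `VectorOps` implementation from this PR and does not use Arrow's classes. It assumes one reference-counted buffer per vector, which is a simplification.

```java
/** Toy model only: these classes are hypothetical and are NOT Arrow's API. */
final class CopySemanticsSketch {

  /** A reference-counted block of memory, standing in for a buffer. */
  static final class Buf {
    int[] data; // set to null once the last holder releases it
    int refCount = 1;

    Buf(int[] data) {
      this.data = data;
    }

    void retain() {
      refCount++;
    }

    void release() {
      if (--refCount == 0) {
        data = null; // memory is freed only when nobody holds it any more
      }
    }
  }

  /** A vector that holds at most one buffer. */
  static final class Vec implements AutoCloseable {
    Buf buf; // null means the vector is empty

    Vec(Buf buf) {
      this.buf = buf;
    }

    boolean isEmpty() {
      return buf == null;
    }

    int get(int index) {
      if (buf == null) {
        throw new IllegalStateException("vector is empty");
      }
      return buf.data[index];
    }

    @Override
    public void close() {
      if (buf != null) {
        buf.release();
        buf = null;
      }
    }
  }

  /** Shared memory: both vectors stay usable, the buffer gains a holder. */
  static Vec shareCopy(Vec source) {
    source.buf.retain();
    return new Vec(source.buf);
  }

  /** Moved ownership: the source is left empty, the count is unchanged. */
  static Vec transferCopy(Vec source) {
    Vec result = new Vec(source.buf);
    source.buf = null;
    return result;
  }

  /** Independent clone: a new buffer with its own reference count. */
  static Vec deepCopy(Vec source) {
    return new Vec(new Buf(source.buf.data.clone()));
  }

  public static void main(String[] args) {
    Vec source = new Vec(new Buf(new int[] {1, 2, 3}));
    Vec shared = shareCopy(source);
    source.close();
    System.out.println(shared.get(1)); // prints 2: closing the source did not free the data

    Vec deep = deepCopy(shared);
    Vec moved = transferCopy(shared);
    System.out.println(shared.isEmpty()); // prints true: ownership moved away
    moved.close();
    System.out.println(deep.get(2)); // prints 3: the deep copy has its own buffer
    deep.close();
  }
}
```

In this model only `shareCopy` changes the reference count, and only `deepCopy` allocates new memory, which mirrors the distinction drawn in the list above.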

These operations work generically across all vector types via getFieldBuffers()/loadFieldBuffers(), requiring no per-type implementation -- unlike TransferPair, which must be implemented by every vector type. VectorOps can replace TransferPair for whole-vector operations; TransferPair remains necessary for sub-range splitting/slicing (splitAndTransfer).

This PR also uses shareCopy to fix the unsafe shared-reference bug in VectorSchemaRoot.addVector() and removeVector() (see #1142). The original implementation shared raw object references between source and result roots, meaning closing one would invalidate the other. The fix preserves the original intended semantics (both roots remain readable) while making it safe through proper reference counting.
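
To make the failure mode concrete, here is a minimal sketch of the difference between sharing raw object references and sharing through a reference count. `Column`, `Root`, `addUnsafe`, and `addShared` are hypothetical names for this toy model only; they are not `VectorSchemaRoot` or its methods. How the PR treats the newly added vector itself is not stated above, so the sketch simply lets the result root take it as-is.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: Column and Root are hypothetical, not Arrow classes. */
final class AddVectorSketch {

  /** A column whose memory is tracked by a counter shared among its copies. */
  static final class Column implements AutoCloseable {
    private final int[] holders; // holders[0] = open columns using this memory
    private boolean closed;

    Column() {
      this(new int[] {1});
    }

    private Column(int[] holders) {
      this.holders = holders;
    }

    /** Returns a second handle on the same memory and bumps the counter. */
    Column shareCopy() {
      if (closed) {
        throw new IllegalStateException("column is closed");
      }
      holders[0]++;
      return new Column(holders);
    }

    boolean isReadable() {
      return !closed;
    }

    boolean memoryReleased() {
      return holders[0] == 0;
    }

    @Override
    public void close() {
      if (!closed) {
        closed = true;
        holders[0]--;
      }
    }
  }

  /** A root that owns its columns and closes them all when it is closed. */
  static final class Root implements AutoCloseable {
    final List<Column> columns;

    Root(List<Column> columns) {
      this.columns = new ArrayList<>(columns);
    }

    /** Unsafe: the new root holds the very same Column objects as this one. */
    Root addUnsafe(Column extra) {
      List<Column> out = new ArrayList<>(columns);
      out.add(extra);
      return new Root(out);
    }

    /** Safer: each existing column is handed over as a retained shared copy. */
    Root addShared(Column extra) {
      List<Column> out = new ArrayList<>();
      for (Column column : columns) {
        out.add(column.shareCopy());
      }
      out.add(extra); // assumption: the result simply takes the new column
      return new Root(out);
    }

    boolean isReadable() {
      return columns.stream().allMatch(Column::isReadable);
    }

    @Override
    public void close() {
      columns.forEach(Column::close);
    }
  }

  public static void main(String[] args) {
    Root source = new Root(List.of(new Column()));
    Root unsafe = source.addUnsafe(new Column());
    Root safe = source.addShared(new Column());

    source.close();
    System.out.println(unsafe.isReadable()); // prints false: it shared closed objects
    System.out.println(safe.isReadable()); // prints true: it holds its own references

    safe.close();
    unsafe.close();
  }
}
```

In the shared variant the underlying memory is released only after both roots have been closed, which is the behaviour the fix aims for.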

Closes #1142

Test plan

  • TestVectorOps: 12 tests covering all three operations on IntVector, VarCharVector, VectorSchemaRoot, with/without explicit allocator, and source-closed-before-shared-copy semantics
  • TestVectorSchemaRoot: Updated existing tests + 2 new ownership tests verifying that closing the source root after addVector/removeVector does not affect the result
  • Spotless formatting passes
  • All tests pass with JDK 21

Made with Cursor

John-W-Lewis and others added 2 commits May 14, 2026 12:20
Provides shareCopy (shared memory), transferCopy (move ownership), and deepCopy (independent clone) for both FieldVector and VectorSchemaRoot, implemented purely via getFieldBuffers/loadFieldBuffers without depending on TransferPair.
…semantics

Use the new VectorOps.shareCopy to fix the unsafe shared-reference bug in
addVector/removeVector while preserving the original intended semantics:
both source and result roots remain usable with the same data, and memory
is only released when all sharing roots are closed.

Also applies Spotless formatting fixes.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions

Thank you for opening a pull request!

Please label the PR with one or more of:

  • bug-fix
  • chore
  • dependencies
  • documentation
  • enhancement

Also, add the 'breaking-change' label if appropriate.

See CONTRIBUTING.md for details.

Development

Successfully merging this pull request may close these issues.

Managing ownership in VectorSchemaRoot#addVector, recent changes miss the main fault.
